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Preface 


The  design  of  Harp  was  completed  in  1975;  construction  of  a 
prototype  proceeded  far  enough  to  validate  the  engineering 
aspects  of  the  design  and  assure  us  that  the  speed  goals  had  been 
met,  but  not  to  a complete  running  Harp  system.  We  nevertheless 
feel  that  this  description  of  its  design  and  philosophy  and  the 
performance  results  of  careful  simulation  may  be  of  interest  to 
others. 

R.R.  & S.S. 
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I.  Introduction 

The  real-time  operation  of  algorithms  that  arise  in  speech  and  vision  research  is 
often  limited  by  the  speed  of  the  low-level  signal  handling  algorithms,  which  in  turn 
are  typically  limited  by  the  computation  time  of  a critical  inner  loop.  To  speed  up  the 
execution  times  of  these  algorithms,  dedicated  special  purpose  hardware  or  very  fast 
auxilliary  processors  have  been  used  to  implement  the  critical  code.  In  signal 
processing  problems,  and  especially  with  the  FFT,  good  results  have  been  obtained 
with  a special  machine  architecture  relying  heavily  on  instruction-cycle  overlap 
(pipelining)  and  on  multiple  parallel  arithmetic  units  capable  of  performing  certain 
complex  operations,  notably  the  FFT  "butterfly"  multiply,  very  efficiently  [FDP]  [SPS- 
41]  [AP-120B].  These  processors  are  almost  invariably  both  expensive  to  build  and 
difficult  to  program,  due  to  their  complex  structure  and  parallelism.  Their 
computational  power  on  suitable  tasks  has  nevertheless  made  them  useful. 

At  Carnegie-Mellon  University  we  are  investigating  artificial  intelligence 
algorithms  which,  although  they  deal  directly  with  input  or  generated  signals,  do  not 
fall  entirely  within  the  realm  of  conventional  "signal  processing".  Floating  point  and 
complex  arithmetic  are  not  usually  required,  and  while  the  FFT  is  used  in  some 
problems,  most  algorithms  cannot  consistently  utilize  such  special  features  as 
butterfly-multiply  hardware  efficiently.  Low  precision  integers  may  be  used  for  most 
computations:  often  only  small  bit  fields  are  manipulated,  as  in  vision  tasks  where 
pixels  may  be  as  small  as  4 bits.  When  signal  processing  is  done,  multiple  precision  is 
as  satisfactory  as  floating  point.  &)nsiderable  logical  manipulation  and  decision-making 
capability  are  also  called  for.  We  have  designed  a new  processor.  Harp,  which  suits 
the  needs  of  our  research  and  is  considerably  more  general-purpose  in  its  orientation 
than  the  signal  processors. 
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Harp  has  the  following  design  goals,  some  of  which  are  in  Keeping  with  the 
signal-processor  approach,  some  not: 

A.  Extremely  high  speed  (under  40  ns  cycle). 

B.  Low  cost  ($5000-$10,000  parts). 

C.  Programming  simplicity:  transparent  pipeline  and  parallelism. 

D.  Large  number  of  high-speed  registers. 

E.  Separate  instruction  and  data  memories  for  overlapped  access. 

F.  General-purpose  capability  (minicomputer-like  instruction  repertoire). 

G.  16-bit  data  and  instructions  for  minicomputer  compatibility. 

H.  Large  high  speed  second-level  memory  (4K  - 64K). 

I.  Extensive  diagnostic  capability. 

Software  tools  were  used  extensively  during  the  design  of  Harp.  A simulator, 
debugger,  and  assembler,  were  built  very  early  in  the  design  phase  of  each  proposed 
architecture.  These  tools  were  used  to  code  8 representative  algorithms  (summarized 
in  Section  VII)  to  give  the  designers  programming  experience  and  performance 
information.  A number  of  complete  design  iterations  resulted  before  Harp  was 
committed  to  hardware. 

II.  Architecture 

Harp  is  an  auxiliary  processor  to  a host  POP-11.  It  is  a "pure"  16-bit  machine: 
instructions,  data  paths,  and  ALU  are  all  16  bits  wide;  there  are  no  "byte"  instructions 
or  packed  data  representations. 

The  Harp  processor  operates  from  two  small,  very  high  speed  memories—the 
data  and  instruction  working  stores.  There  are  no  data  registers  or  accumulators  in 
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UNIBUS 


Figure  1:  PMS  Diagram  of  Haro 


the  conventional  sense.  The  data  working  stores  have  a 13  ns  cycle  time,  providing 
the  processor  with  a 1.2  Gbps  (bljlion  bits  per  second)  data  bandwidth. 

The  working  store  sizes  --  64  words  of  data,  64  to  256  words  of  instructions  — 
were  determined  by  the  number  of  address  bits  available  in  an  instruction  and  by 
physical  size  limitations  caused  by  signal  propagation  delays.  Because  Harp  is 
designed  to  run  small  code  fragments  these  small  working  stores  are  usually 
satisfactory.  The  enforced  separation  of  data  from  instructions,  though  it  seldom 
occurs  in  general-purpose  machines,  does  not  impact  programming  style,  and  gains 
considerable  speed  by  overlapping  code  and  data  memory  accesses. 

The  processor  itself  operates  in  the  usual  pipelined  instruction  overlap  fashion 
(see  Figure  2).  Instruction  execution  is  divided  into  four  stages:  1)  instruction  fetch 
2)  operand  fetch  3)  execution  4)  result  store.  Total  execution  time  is  133  ns,  with  the 
four  stages  overlapped  to  permit  a new  instruction  fetch  every  40  ns.  Thus  the 
Instruction  rate  is  25  Mips  (million  instructions  per  second). 
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Figure  2:  Haro  Pipeline  Timing 

Cost  and  generality  constrained  the  design  of  the  Harp  arithmetic  unit.  It  is 
limited  to  a 16  bit  two’s  complement  general  purpose  function  generator  and  a single, 
separate  (and  slower)  multiplier.  The  algorithms  listed  in  Section  VII  show  that 
execution  speed  is  not  limited  by  the  instruction  set. 

The  working  store  contents  can  be  transferred  to  or  from  a large  buffer  memory 
via  a block  transfer  mechanism.  The  transfer  rate  of  0.8  Gbps  (20  ns  per  word)  avoids 
bottlenecking  the  fast  processing  of  Harp.  Buffer  memory  is  expandable  from  4K  to 
64K  and  is  double-ported,  one  port  permitting  high  speed  transfers  to  the  Harp 
working  stores,  the  other  compatible  with  a PDP-11  UNIBUS. 

No  single-word  direct  access  to  buffer  memory  is  provided  for  several  reasons: 

1)  this  memory  is  interleaved,  and  the  delay  for  initiating  access  is  five  times  greater 
than  the  average  transfer  time;  2)  allowing  access  an  a cycle -to-cycle  basis  would 
have  slowed  Harp’s  execution  rate  considerably;  3)  two  words  of  working  store  are 
taken  up  by  addresses  for  each  reference  to  buffer  memory. 
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The  block  transfer  machanism  is  used  to  overlay  programs  too  large  to  fit  in  the 
available  instruction  memory.  Block  transfers  are  fast  enough  to  make  frequent 
overlays  acceptable  in  many  situations:  a 64-word  transfer  takes  less  than  1.5  us. 

Since  Harp  is  intended  to  operate  in  conjunction  with  a host  computer,  no  input 
or  output  capability  is  provided  other  than  the  buffer  memory  connection  to  the 
UNIBUS.  The  UNIBUS  bandwidth  is  sufficient  for  most  I/O  to  real-time  devices  and 
mass  storage.  Since  the  relatively  slow  PDP-11  coordinates  these  transfers  to  the 
buffer  memory,  no  hardware  interrupt  capability  is  included  in  Harp. 

III.  Instruction  Set 

Two-address  instructions  are  advantageous  in  high  speed  processing,  since  one 
such  instruction  can  accomplish  as  much  as  two  or  three  single-address  instructions. 
Including  two-address  or  "2 -op"  instructions  in  the  16  bit  Harp  instruction  words, 
however,  forced  a tightly-packed  coding  of  address  and  opcode  fields.  Careful 
consideration  of  the  algorithms  to  be  implemented  on  Harp  led  to  two  helpful 
observations:  1)  only  a few  kinds  of  double  operand  instructions  are  used  in  a given 
loop,  although  the  instructions  themselves  vary  from  algorithm  to  algorithm  (see  details 
in  Figure  4)j  2)  indexed  addressing  with  some  kind  of  auto-incrementation  is  very 
useful  for  the  vector -oriented  operations.  These  considerations  led  to  the 
unconventional  instruction  format  shown  in  Figure  3. 

The  upper  two  bits  of  the  instruction  word,  if  nonzero,  indicate  a two-op 
instruction.  The  three  possible  nonzero  combinations  select  one  of  three  op-code 
registers:  OPA,  OPB  or  OPC.  The  contents  of  the  selected  register  then  determines 
the  particular  two-op  instruction  from  among  19  implemented  functions.  Thus  a 
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program  is  limited  to  only  three  double-op  instruction  types  at  a time,  but  the 
programmer  can  select  which  three  by  loading  the  op-code  registers. 

The  operand  address  fields  contain  7 bits:  6 to  directly  address  the  64  locations 
in  the  data  working  store,  and  one  addressing  mode  bit.  The  mode  bit  selects  one  of  a 
pair  of  available  modes  as  determined  by  the  AORM  register  (see  Appendix  B).  There 
are  two  index  registers,  XI  and  X2,  with  auto-increment  and  auto-decrement  capability. 
These  registers  may  also  be  modified  by  some  of  the  branch  instructions  for  simplified 
loop  control. 

If  the  upper  instruction  word  bits  are  zero.  Harp  decodes  the  instruction  to 
determine  whether  it  is  a single-op,  branch,  register  load/store,  or  block  transfer.  The 
16  single-op  instructions  are  similar  to  PDP-11  instructions.  Arithmetic  and  logical 
operations  set  the  N,  Z,  C,  and  V bits  in  the  PS  register  which  are  used  by  the 
conditional  branch  instructions,  again  following  the  PDP-11  conventions.  There  are 
instructions  for  loading  and  storing  the  8 state  registers  (OP A,  OPB,  OPC,  ADRM,  XI,  X2, 
PS,  PC)  including  a literal  toad. 


The  block  transfer  instructions  move  data  between  the  buffer  memory  and  the 
working  stores  and  optionally  interrupt  the  PDP-11.  A two  word  descriptor  in  working 
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store  provides  a buffer  memory  start  address,  a working  store  start  address,  and  a 
block  length. 

A special  set  of  instructions  dispose  of  the  output  of  the  separate  multiplication 
processor,  which  performs  16x16  bit  multiplications  and  retains  the  32-bit  product. 
Instructions  are  provided  to  store  the  two  halves  of  this  result  in  data  memory  and  to 
add  them  to  data  memory  (to  accumulate  a double-precision  inner  product).  The 
multiplication  processor  operates  in  parallel  with  the  central  processor  but  receives 
instructions  and  operands  from  it.  The  16x16  multiply  takes  80  ns  to  complete,  so  a 
program  must  execute  at  least  one  instruction  after  the  MUL  before  accessing  the 
product. 

Though  Harp  is  a pipelined  machine,  the  pipe  is  invisible  to  all  instructions 
except  explicit  state-register  references.  Branch  lookahead  hardware  avoids  delay 
while  allowing  natural  instruction  sequencing.  Programmers  need  not  be  concerned 
with  such  complications  as  "clearing  the  pipe  on  a branch"  which  plague  most  signal 
processors. 

IV.  Engineering  Considerations 

Harp  is  implemented  with  lOK  series  ECL  integrated  circuits  in  the  processors, 
working  stores  and  their  control  [MECL].  Approximately  400  ECL  chips  are  mounted  on 
8 double-sided  9x12  inch  PC  boards.  The  "backplane"  is  square  shaped  instead  of 
flat,  with  sockets  for  two  PC  boards  on  each  of  the  four  sides;  the  boards  plug  into 
this  "core”  in  a cross  configuration  so  that  the  component  side 'of  each  board  is 
accessible  at  all  times.  This  radial  construction  technique  keeps  inter-board 
communication  delay  very  short,  and  during  servicing  provides  access  to  all  chips 
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simultaneously  without  the  need  for  extender  boards.  Although  the  cross  consumes 
considerable  packaging  volume,  it  is  rackmountable  at  17x  17x  12  inches. 

Double-sided  PC  construction  was  chosen  over  wire-wrap  because  of  cost, 
signal  fidelity,  and  reproducibility.  Multilayer  PC,  with  its  higher  prototype  turn- 
around time  and  expense  is  not  justified  by  the  present  circuit  density. 

Memory  chips  of  sufficient  size  and  speed  for  the  buffer  memory  are  presently 
least  expensive  in  TTL,  so  this  memory  and  the  PDP-ll  interface  are  implemented  with 
TTL  The  buffer  memory  chips  and  their  control  circuitry  are  mounted  on  PC  boards 
adjoining  the  cross.  There  is  room  to  expand  the  memory  to  64K  words  (16  boards) 
while  keeping  the  memory  bus  length  to  the  ECL  converters  under  12  inches.  Cables 
connect  to  the  PDP-11  interface  and  control  circuitry.  This  100  chip  TTL  circuit  is 
wire-wrapped  in  the  prototype. 

The  prospect  of  having  several  Harps  in  our  environment  encouraged  diagnostic 
capability  to  ease  the  hardware  maintance  burden.  Since  most  of  the  machine  failures 
are  expected  to  be  hard  static  faults  rather  than  high  speed  timing  problems, 
considerable  work  has  gone  into  a hardware  diagnostic  system  and  its  software 
support  package.  All  of  these  involve  the  POP-11  as  an  interrogater  and  data 
collector.  The  machine  clock  may  be  stopped  and  Harp  may  be  single-stepped  by  PDP- 
ll  software.  Between  cycles,  all  registers  and  inter-board  data  and  control  signals 
may  be  examined  by  the  PDP-11  without  affecting  the  state  of  Harp;  the  working 
stores  can  be  accesed  through  the  Harp  processor  while  Harp  is  halted. 

Lower-level  diagnostics  exercise  the  buffer  memory,  working  stores,  processor, 
and  control  at  levels  sucessively  deeper  within  Harp.  Combined  with  the  accessibility 
of  inter-board  signals,  these  should  allow  location  of  faults  at  the  register  level. 
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V.  Software 

Harp  is  fully  supported  by  an  extensive  collection  of  system  software,  iricluding 
a symbolic  assembler,  an  unusual  graphics-based  debugging  package,  a complete 
simulator,  and  a set  of  PDF-11  routines  for  Harp  control. 

The  Harp  display-oriented  debugger  runs  on  the  PDP-11  and  uses  a graphic 
display  terminal  to  maintain  a picture  of  the  instruction  memory  (decoded),  the  data 
memory,  and  the  state  registers;  it  has  facilities  for  tracing  programs  and  modifying 
memory  in  Harp.  The  debugger  can  be  used  with  either  the  simulator  or  the  actual 
hardware. 

A package  of  BLISS-11  routines  and  macros  allows  the  PDP-11  program  to 
access  all  Harp  memories  and  to  control  execution  of  Harp.  These  routines  are  the 
basic  facility  for  inner  loop  execution  on  Harp. 

VI.  General-Purpose  Capabilities 

The  designers  of  Harp  began  with  a view  of  creating  a "functionally  specialized 
architecture”  for  the  problems  encountered  in  speech  and  vision  research.  As  we 
progressed  it  became  clear  that  the  only  relevant  common  characteristics  of  the  tasks 
are  1)  fairly  large  amount  of  processing  on  each  datum,  and  2)  small  items,  usually  4 to 
12  bits  of  precision.  The  first  observation  is  expressed  in  the  size  of  data  working 
store  — 64  words  is  large  if  considered  as  "registers",  but  small  for  a "main  memory"; 
as  "working  store",  it  is  well  matched  to  Harp’s  tasks.  The  second  led  to  the  choice  of 
small  (16  bit)  integers  as  the  data  type.  The  resulting  design  looks  not  at  all  like  a 
"specialized"  processor,  but  more  like  a clean  vertically  microcoded  minicomputer  with 
residual  control.  Notably,  at  least  one  recent  signal  processor  design  is  billed  by  its 
builders  as  basically  a "high  performance  minicomputer"  [LDVT]. 
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We  expect  Harp  to  be  so  cost-effective  that  it  will  be  attractive  whenever  more 
processing  power  is  needed  on  a PDP-11  system  or  similar  minicomputer. 


VII.  Performance 


The  simulated  performance  of  Harp  on  8 representative  tasks  is  shown  in  Figure 
5,  compared  to  the  performance  of  two  conventional  computers  on  the  same  tasks. 
The  effectiveness  of  the  Harp  architecture  and  instruction  set  design  can  be  seen  in 


Figure  4 which  shows  frequencies  of  use  of  various  machine  features. 
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Figure  4:  Haro  Statistics 
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Algorithm 

Histogramming: 

Dragon  (Speech  Understanding); 
Butterfly  Multiply  for  FFT: 

Template  Matching  (14x40  Multiply): 
Autocorrelation  with  Hamming  Window: 
Edge  Suppression: 

Picture  Smoothing  or  Texture: 

Matrix  Multiplication  for  3-D  Graphics: 
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PDP-10  SAIL 

PDP-1 1 ASM 

HARP 

7.62 

5.99 

0.172 

12.39 

9.84 

0.219 

1.49 

2.50 

0.029 

25.04 

17.96 

0.114 

126.36 

130.79 

1.054 

14.93 

6.73 

0.178 

20.66 

9.11 

0.103 

5.97 

5.32 

0.018 

Figure  5:  Comparative  Times 


Histogramming  is  an  image  understanding  algorithm  that  examines  an  entire 
picture  and  yields  the  total  number  of  points  at  each  intensity  level.  Two  data  types 
are  used:  picture  elements  which  are  8 bits,  and  256  "buckets'*  for  counting  picture 
elements.  The  buckets  must  be  at  least  19  bits  for  pictures  which  are  480  by  640 
points. 

Dragon  is  a speech  recognition  system  that  uses  probability  networks  to 
understand  speech.  The  inner  loop  involves  calculating  the  probabilities  of 
transitioning  through  the  network  at  each  time  interval.  All  values  are  represented  as 
logarithms  so  that  no  multiplication  is  necessary.  Sixteen-bit  precision  is  adequate 
throughout  all  calculations. 

Template  matching,  a speech  understanding  algorithm,  compares  the  parametric 
representations  of  two  signals.  It  multiplies  a 1x14  array  with  a 14x40  array  to  give  a 
1x40  array.  Maximum  input  precision  is  12  bits,  so  28  bits  are  adequate  for 
calculations  and  output. 

The  graphics  algorithm  Box  Fitting  retrieves  points  in  a sparsely  represented 
three-dimensional  space.  The  space  is  broken  down  into  as  many  as  8 arbitrarily 
rotated  and  located  rectangular  boxes,  each  completely  described  by  a 4x4 
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transformation  matrix.  Given  a point  in  3-space,  the  list  of  boxes  is  searched  to  find 
out  whether  the  point  is  in  a box.  If  successful,  a search  is  done  on  8 sub-boxes,  etc. 
The  3-space  points  are  10  bits  and  the  transformation  matrices  16  bits  but  the 
intermediate  results  need  26  bits. 

Autocorrelation  is  a speech  algorithm  for  examining  waveforms.  The  Harp 
algorithm  performs  a Hamming  window  normalization  on  the  input  data  prior  to  the 
autocorrelation.  This  consists  of  multiplying  12-bit  signal  values  by  a window  function 
to  give  a 12-bit  signal.  The  algorithm  produces  correlation  figures  for  15  different 
period  intervals  along  an  input  waveform.  The  output  values,  multiplied  and  summed 
across  200  samples,  must  be  at  least  32  bits. 

Smoothing  is  an  image  understanding  algorithm  that  examines  every  point  in  a 
picture  and  smoothes  it  in  relation  to  its  context.  The  8 points  surrounding  the  point 
to  be  smoothed  are  summed;  if  the  sum  exceeds  a threshold,  the  point  is  smoothed  out. 
Another  image  understanding  algorithm  is  Edge  Suppression,  used  for  thinning  out  the 
edges  in  a picture  by  removing  extraneous  picture  elements  on  an  edge.  Each  scan  line 
is  examined,  and  only  local  maximum  points  are  retained.  Both  of  these  image 
algorithms  work  on  picture  elements  of  up  to  8 bits  in  size. 

Fast  Fourier  Transform  is  the  standard  signal  processing  algorithm  that 
transforms  from  a time  domain  to  a frequency  domain.  This  particular  implementation 
does  a bit-reversal  (re-organizing  of  data)  and  a transform  of  1024  complex  values 
using  16  bit  integers. 
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Mnemonic  Descriotion 

Two-Od  Instructions 

ADD 

Add 

ADDC 

Add  with  carry 

SUB 

Subtract 

SUBC 

Subtract  with  carry 

RSUB 

Reverse  subtract 

RSUBC 

Reverse  subtract  with  carry 

MUL 

Multiply 

MOV 

Move 

MOVN 

Move  negative 

AND 

Logical  AND 

BIS 

Logical  OR 

NAND 

Logical  NANO  (Not-And) 

NOR 

Logical  NOR  (Not-Or) 

XOR 

Logical  Exclusive  Or 

EQV 

Logical  Equivalence 

BIG 

Bit  Clear 

IMP 

Logical  Implication 

BIT 

Test  bits 

CMP 

Compare 

One-Oo  Instructions 

NOP 

No  operation 

HALT! 

Halt  and  interrupt 

TST 

Test 

CLR 

Clear 

COM 

Complement 

NEG 

Negate 

INC 

Increment 

DEC 

Decrement 

ROR 

Rotate  right 

ROL 

Rotate  left 

ASR 

Arithmetic  shift  right 

ASL 

Arithmetic  shift  left 

ADC 

Add  carry 

SBC 

Subtract  carry 

SXT 

Sign  extend 
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Block  Transfer  Instructions 

DATAI 

Read  into  data  memory 

DATAO 

Write  from  data  memory 

INSTI 

Read  into  instruction  memory 

INSTO 

Write  from  instruction  memory 

INTDI 

Read  into  data  memory  and  interrupt 

INTDO 

Write  from  data  memory  and  interrupt 

INTII 

Read  into  instruction  memory  and  interrupt 

INTIO 

Write  from  instruction  memory  and  interrupt 

Branch  Instructions 

BR 

Branch 

BNE 

Branch  if  not  equal 

BEQ 

Branch  if  equal 

BGE 

Branch  if  greater  than  or  equal 

BIT 

Branch  if  less  than  or  equal 

BGT 

Branch  if  greater  than  zero 

BLE 

Branch  if  less  than  or  equal 

BPL 

Branch  if  positive 

BMI 

Branch  if  negative 

BHI 

Branch  if  higher  (unsigned) 

BLOS 

Branch  if  lower  or  sanie  (unsigned) 

BVC 

Branch  if  overflow  clear 

BVS 

Branch  if  overflow  set 

BHIS/BCC 

Branch  if  higher  or  same  (unsigned)  / carry  set 

BLO/BCS 

Branch  if  lower  (unsigned)  / carry  set 

State  Reaister  Instructions 

STn 

Store  register  n into  data  memory 

LDnO 

Load  register  n from  data  memory 

LDn 

Load  register  n from  literal 

Multiolv-fetch  Instructions 

MACH 

Add  high  part  of  product 

MACL 

Add  low  part  of  product 

MSTH 

Store  high  part  of  product 

MSTL 

Store  low  part  of  product 

Lood  Control  Instructions 

AOBl 

Increment  XI  and  branch  if  non-zero 

AOB2 

Increment  X2  and  branch  if  non-zero 

SOBl 

Decrement  XI  and  branch  if  non-zero 

SOB2 

Decrement  X2  and  branch  if  non-zero 
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APPENDIX  B 


Addressing  Modes 


The  following  table  lists  the  meaning  of  the  Addressing  Mode  Register  (AORM). 
This  register  is  divided  into  two  4-bit  fields  called  the  source  mode  and  the  destination 
mode.  Each  field  selects  a pair  of  addressing  modes  that  a source  or  destination 
operand  can  choose  from.  The  high  bit  of  the  instruction  address  field  selects  one  of 
the  pair. 

D The  low  6 bits  directly  address  the  operand. 

X 1 The  low  6 bits  are  added  to  index  register  1 to  obtain  the  address. 

X2  The  low  6 bits  are  indexed  with  index  register  2. 

XI  + The  low  6 bits  are  indexed  with  XI,  which  is  then  incremented. 

X2-f  The  low  6 bits  are  indexed  with  X2,  which  is  then  incremented. 

XI-  The  low  6 bits  are  indexed  with  XI,  which  is  then  decremented. 

X2-  The  low  6 bits  are  indexed  with  X2,  which  is  then  decremented. 
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[AP-120B] 

[BLISS] 

[FDP] 

[LDVT] 

[MECL] 

[SPS-41] 
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